Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions to this problem is to detect speech-dominant regions by using a voice activity detector (VAD). In this paper, a second-order polynomial regression-based algorithm is proposed that serves a similar function to a VAD for text-independent speaker verification systems. The proposed method aims to separate steady noise/silence regions, steady speech regions, and speech onset/offset regions. The regression is applied independently to each filter band of a mel spectrum, which lets the algorithm fit seamlessly into the conventional extraction process of the mel-frequency cepstral coefficients (MFCCs). The k-means algorithm is also applied to estimate the average noise energy in each band for spectral subtraction. A pseudo SNR-dependent linear thresholding for the final VAD output decision is introduced based on the k-means energy centers. This thresholding considers the speech presence in each band. Conventional VADs usually neglect the deteriorative effects of additive noise in the speech regions. In contrast, the proposed method decides not only whether speech is present, but also whether the frame is dominated by speech or by noise. The performance of the proposed algorithm is compared with a continuous noise tracking method and another VAD method in speaker verification experiments, where five different noise types at five different SNR levels were considered. The proposed algorithm showed superior verification performance with both the conventional GMM-UBM method and the state-of-the-art i-vector method.
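The two building blocks named in the abstract, per-band second-order polynomial regression and k-means energy centers, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sliding_quadratic_fit` and `two_means_1d`, the window half-width, the edge padding, and the use of exactly two k-means centers are all assumptions made for illustration, and the actual decision rules and thresholds of the proposed method are not reproduced here.

```python
import numpy as np


def sliding_quadratic_fit(band_energy, half_width=5):
    """Fit y = a*t^2 + b*t + c to a window centred on every frame of ONE mel band.

    Hypothetical helper: the window length and edge padding are assumptions.
    Returns an array of shape (n_frames, 3) holding (a, b, c) per frame.
    """
    band_energy = np.asarray(band_energy, dtype=float)
    t = np.arange(-half_width, half_width + 1)
    # Repeat the edge values so that every frame has a full window.
    padded = np.pad(band_energy, half_width, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half_width + 1)
    # np.polyfit accepts a 2-D y with one data set per column.
    coeffs = np.polyfit(t, windows.T, deg=2)  # shape (3, n_frames)
    return coeffs.T


def two_means_1d(values, n_iter=50):
    """Plain two-centre k-means on scalar energies (assumed: exactly 2 centres).

    Returns (low_centre, high_centre); the low centre can serve as a rough
    noise-energy estimate for the band.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    for _ in range(n_iter):
        is_hi = np.abs(values - hi) < np.abs(values - lo)
        if is_hi.all() or not is_hi.any():
            break  # degenerate case: all points fall into one cluster
        new_lo, new_hi = values[~is_hi].mean(), values[is_hi].mean()
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return lo, hi
```

Under these assumptions, frames where the fitted slope `b` and curvature `a` are both near zero would correspond to steady regions (noise or speech, distinguished by the constant term `c` relative to the k-means centers), while a large slope would indicate an onset or offset. Applying `sliding_quadratic_fit` to each column of a frames-by-bands mel-energy matrix gives the per-band treatment the abstract describes.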